Technical Report: CSVM format for scientific tabular data
Authors
Abstract
CSVM (CSV with Metadata) is derived from the CSV format and is used for storing experimental data, models, and specifications. CSVM allows tabular data to be stored with a limited but extensible amount of metadata. This facilitates the exchange and long-term use of RAW data, because all the information needed to use the data later is included in the CSVM file. Basic CSVM files are readable by current table-handling tools (e.g. spreadsheets). Using the full possibilities of the concept, it is possible to deviate from a strict table and to add annotations inside the data block as well. CSVM files are ASCII files and could provide a template for implementing best practices in handling RAW data, in exchange and normalization, in long-term resources, or in collaborative processes. This document describes the first release (CSVM-1) of the CSVM format.
Similar resources
Technical report: CSVM dictionaries
CSVM (CSV with Metadata) is a simple file format for tabular data. The possible application domain is the same as that of typical spreadsheet files, but CSVM is well suited for long-term storage and the inter-conversion of RAW data. CSVM embeds different levels for data, metadata and annotations in human-readable format and flat ASCII files. As a proof of concept, Perl and Python toolkits were designe...
Technical Report: CSVM Ecosystem
The CSVM format is derived from the CSV format and allows the storage of tabular-like data with a limited but extensible amount of metadata. This approach could help computer scientists because all the information needed to use the data subsequently is included in the CSVM file; it is particularly well suited for handling RAW data in many scientific fields and for use as a canonical format. The...
On the communication of scientific data: The Full-Metadata Format
In this paper, we introduce a scientific format for text-based data files, which facilitates storing and communicating tabular data sets. The so-called Full-Metadata Format builds on the widely used INI-standard and is based on four principles: readable self-documentation, flexible structure, fail-safe compatibility, and searchability. As a consequence, all metadata required to interpret the ta...
Clustered Support Vector Machines
In many problems of machine learning, the data are distributed nonlinearly. One way to address this kind of data is training a nonlinear classifier such as kernel support vector machine (kernel SVM). However, the computational burden of kernel SVM limits its application to large scale datasets. In this paper, we propose a Clustered Support Vector Machine (CSVM), which tackles the data in a divi...
Epi Archive: automated data collection of notifiable disease data
Introduction Most countries do not report national notifiable disease data in a machine-readable format. Data are often in the form of a file that contains text, tables and graphs summarizing weekly or monthly disease counts. This presents a problem when information is needed for more data intensive approaches to epidemiology, biosurveillance and public health as exemplified by the Biosurveilla...
Journal: CoRR
Volume: abs/1207.5711
Publication date: 2012